Main Features of Interest
Interestingly, the relationship between alcohol and quality rating doesn’t seem to be linear and positive. The median alcohol level drops from 10.1% among wines rated 4 to 9.5% among wines rated 5 before rising again, which leads me to think that other factors contribute to this U-shaped curve between alcohol and quality rating. For good wines (quality rating above 5), however, the higher the alcohol level, the better the rating: the median rises from 10.5% at rating 6 to 11.4% at rating 7 and 12% at rating 8. That last figure is about a quarter (26%) higher than the 9.5% among wines rated 5.
## group: 3
## NULL
## --------------------------------------------------------
## group: 4
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 163 10.15 1 10.1 10.08 1.04 8.4 13.5 5.1 0.7 0.15 0.08
## --------------------------------------------------------
## group: 5
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 1457 9.81 0.85 9.5 9.71 0.74 8 13.6 5.6 1.08 1.07 0.02
## --------------------------------------------------------
## group: 6
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 2198 10.58 1.15 10.5 10.52 1.33 8.5 14 5.5 0.4 -0.72 0.02
## --------------------------------------------------------
## group: 7
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 880 11.37 1.25 11.4 11.42 1.33 8.6 14.2 5.6 -0.3 -0.56 0.04
## --------------------------------------------------------
## group: 8
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 175 11.64 1.28 12 11.78 1.19 8.5 14 5.5 -0.89 0.01 0.1
## --------------------------------------------------------
## group: 9
## NULL
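The per-group statistics above appear to come from R's `psych::describeBy`. The Python sketch below shows what such a grouped summary computes. It groups alcohol values by quality rating and reports n, mean, sample standard deviation and median, which is only a subset of the describeBy columns (trimmed mean, mad, skew, kurtosis and se are omitted). The function name `describe_by` and the toy values are hypothetical, chosen for illustration. They are not the wine data analyzed here.

```python
import statistics
from collections import defaultdict


def describe_by(values, groups):
    """Per-group n, mean, sample sd and median (a subset of psych::describeBy's columns)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    summary = {}
    for group in sorted(buckets):
        xs = buckets[group]
        summary[group] = {
            "n": len(xs),
            "mean": statistics.mean(xs),
            # Sample standard deviation (n - 1 denominator); undefined for a single value.
            "sd": statistics.stdev(xs) if len(xs) > 1 else float("nan"),
            "median": statistics.median(xs),
        }
    return summary


# Hypothetical toy values, not the wine data analyzed in this report.
alcohol = [9.4, 9.8, 10.1, 9.5, 9.2, 12.0, 12.4, 11.8]
quality = [5, 5, 4, 5, 5, 8, 8, 8]
for q, row in describe_by(alcohol, quality).items():
    print(q, row["n"], round(row["median"], 2))
```

The median is reported alongside the mean because the alcohol distributions within some quality groups are skewed (see the skew column above).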
Higher alcohol levels tend to come with higher volatile acidity, but the chart below shows that the relationship between the two is non-linear. This explains why Section I showed only a weak correlation (-0.19). To look more closely, I divided alcohol level into five groups of equal width. Median volatile acidity falls slightly, from 0.27 in the lowest alcohol group (7.99-9.24%) to 0.24 in the 10.5-11.7% group. It then rises to 0.35 among wines with the highest alcohol level (13-14.2%), noticeably higher than in the 10.5-11.7% group.
## (7.99,9.24] (9.24,10.5] (10.5,11.7] (11.7,13] (13,14.2]
## 841 1723 1382 789 138
## group: (7.99,9.24]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 841 0.28 0.1 0.27 0.27 0.07 0.1 0.82 0.71 1.46 3.25 0
## --------------------------------------------------------
## group: (9.24,10.5]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 1723 0.28 0.1 0.26 0.27 0.09 0.08 1 0.92 1.7 5.88 0
## --------------------------------------------------------
## group: (10.5,11.7]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 1382 0.26 0.09 0.24 0.25 0.07 0.09 0.96 0.88 1.79 6.87 0
## --------------------------------------------------------
## group: (11.7,13]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 789 0.3 0.1 0.29 0.29 0.07 0.08 1.1 1.02 1.45 6.36 0
## --------------------------------------------------------
## group: (13,14.2]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 138 0.37 0.12 0.35 0.36 0.1 0.15 0.78 0.64 0.87 1.18 0.01
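The interval labels above, such as (7.99,9.24], match what R's `cut(x, breaks = 5)` produces. That rule builds equal-width bins between the minimum (8%) and maximum (14.2%) alcohol level and widens the outer edges by 0.1% of the range. It then prints labels with three significant digits. The Python sketch below reproduces that rule, assuming this is how the groups were built. It also assumes the minimum is strictly below the maximum. The names `equal_width_breaks`, `interval_labels`, `assign_bin` and `pearson` are hypothetical. The `pearson` helper illustrates the reasoning above: a strong but U-shaped relationship can still give a Pearson correlation near zero.

```python
from bisect import bisect_left


def equal_width_breaks(lo, hi, k):
    """Break points for k equal-width bins over [lo, hi], following R's cut(x, breaks = k)."""
    dx = hi - lo  # assumes lo < hi
    breaks = [lo + i * dx / k for i in range(k + 1)]
    # Like R, widen the outer edges by 0.1% of the range so the minimum
    # and maximum fall strictly inside the first and last intervals.
    breaks[0] = lo - dx / 1000
    breaks[-1] = hi + dx / 1000
    return breaks


def interval_labels(breaks, dig_lab=3):
    """Right-closed labels such as '(7.99,9.24]' using dig_lab significant digits."""
    # Add digits until neighboring labels differ, as R does.
    for dig in range(dig_lab, max(12, dig_lab) + 1):
        text = [f"{b:.{dig}g}" for b in breaks]
        if all(a != b for a, b in zip(text, text[1:])):
            break
    return [f"({a},{b}]" for a, b in zip(text, text[1:])]


def assign_bin(x, breaks):
    """Index i of the right-closed interval (breaks[i], breaks[i + 1]] that contains x."""
    i = bisect_left(breaks, x) - 1
    if not 0 <= i < len(breaks) - 1:
        raise ValueError(f"{x} lies outside the binned range")
    return i


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


breaks = equal_width_breaks(8.0, 14.2, 5)  # alcohol min and max from the summaries above
print(interval_labels(breaks))
# ['(7.99,9.24]', '(9.24,10.5]', '(10.5,11.7]', '(11.7,13]', '(13,14.2]']
print(assign_bin(12.5, breaks))  # 3, i.e. the (11.7,13] group

# A perfectly U-shaped toy relationship: y depends entirely on x, yet r is 0.
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

Equal-width bins keep the interval boundaries easy to read. The cost is very uneven group sizes: only 138 wines fall in the top group, against 1723 in the second.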
Free sulfur dioxide decreases as alcohol level rises, and its median falls in every successive alcohol group. For example, among wines with an alcohol level of 7.99-9.24%, the median free sulfur dioxide is 42 mg/l. That is 50% higher than the median (28 mg/l) in the 13-14.2% alcohol group.
## group: (7.99,9.24]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 841 41.54 15.77 42 41.37 16.31 5 128 123 0.32 0.94 0.54
## --------------------------------------------------------
## group: (9.24,10.5]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 1723 37.55 17.98 36 36.75 19.27 3 138.5 135.5 0.57 0.64 0.43
## --------------------------------------------------------
## group: (10.5,11.7]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 1382 31.97 15.26 31 31.06 14.83 2 131 129 0.96 2.67 0.41
## --------------------------------------------------------
## group: (11.7,13]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 789 30.47 12.71 30 30 11.86 3 96 93 0.77 2.52 0.45
## --------------------------------------------------------
## group: (13,14.2]
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 138 27.88 12.14 28 27.35 13.34 3 65 62 0.4 -0.07 1.03
Although the feature selection placed free sulfur dioxide and volatile acidity among the top three most important features, plotting them against quality rating doesn’t reveal a strong relationship. Other factors may be masking it.